2  Targets Pipeline Overview

This project uses the targets package to automate and organize its data analysis workflow. This section gives an overview of the pipeline's structure and components.

2.1 Loading the Targets Package

if (!requireNamespace("targets", quietly = TRUE)) {
  install.packages("targets")
}

library(targets)

2.2 Example Targets Workflow

# Define targets
list(
  # Target 1: Paths to annotation files
  tar_target(
    funct_annotations_path,
    here::here("data-raw/meat_ref_db/f3m_meat_genes_catalog_20241211_funct_annotations.tsv"),
    format = "file"
  ),
  tar_target(
    gtdb_classification_path,
    here::here("data-raw/meat_ref_db/meat_genes_catalog_gtdb_classification.tsv"),
    format = "file"
  ),

  # Target 2: Build reference database
  tar_target(
    meat_ref_db,
    build_ref_db(
      funct_annotations_path = funct_annotations_path,
      gtdb_classification_path = gtdb_classification_path
    )
  ),

  # Target 3: Path to folder with sample files
  tar_target(
    folder_path,
    here::here("data-raw/capfood"),
    format = "file"
  ),

  # Target 4: Import multiple sample counts
  tar_target(
    all_sample_counts,
    import_multiple_samples(folder_path = folder_path)
  ),

  # Target 5: Define aggregation levels
  tar_target(taxonomic_level, "genus"),
  tar_target(functional_level, "food_microbiome_metabolic_function"),

  # Target 6: Aggregate counts
  tar_target(
    aggregated_counts,
    aggregate_counts(
      all_sample_counts = all_sample_counts,
      ref_db = meat_ref_db,
      taxonomic_level = taxonomic_level,
      functional_level = functional_level,
      basal_categories = c(
        "F3MA_RNA_metabolism",
        "F3MB_nucleotide_metabolism",
        "F3MD_DNA_metabolism"
      )
    )
  ),

  # Target 7: Build count matrix
  tar_target(
    count_matrix,
    build_count_matrix(
      aggregated_counts = aggregated_counts,
      deseq2 = TRUE
    )
  ),

  # Target 8: Build basal metabolism matrix
  tar_target(
    basal_metabolism_matrix,
    build_basal_matrix(
      aggregated_counts = aggregated_counts
    )
  )

)

2.3 Pipeline Visualization

The following plot provides a visual representation of the pipeline dependencies. It shows how the targets are connected and the flow of data processing.
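
Dependency graphs of this kind are commonly drawn with tar_visnetwork(); that this page used it is an assumption. The sketch below builds a hypothetical two-target pipeline (raw and total are illustrative names, not part of this project) in a temporary directory, inspects the dependency graph as data with tar_network(), and builds the interactive widget only if the optional visNetwork package is installed.

```r
library(targets)

# Work in a throwaway directory so the real project is untouched.
demo_dir <- tempfile("targets_graph_demo_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Hypothetical two-target pipeline: `total` depends on `raw`.
tar_script({
  list(
    tar_target(raw, c(1, 2, 3)),
    tar_target(total, sum(raw))
  )
}, ask = FALSE)

# The dependency graph as data: a list with `vertices` and `edges`.
graph <- tar_network(targets_only = TRUE)
edges <- graph$edges
print(edges)

# Interactive rendering needs the optional visNetwork package.
if (requireNamespace("visNetwork", quietly = TRUE)) {
  widget <- tar_visnetwork(targets_only = TRUE)
}

setwd(old_wd)
```

In the project itself, calling tar_visnetwork() from the directory that contains _targets.R should draw the graph for the workflow in section 2.2.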


2.4 Pipeline Summary

The pipeline consists of the following targets:

tar_manifest()

2.4.1 Key Targets

  • Reference Database (meat_ref_db): Builds the reference database for functional and taxonomic annotations.
  • All Sample Counts (all_sample_counts): Imports and combines all sample data.
  • Aggregated Counts (aggregated_counts): Aggregates counts at specified taxonomic and functional levels.
  • Count Matrix (count_matrix): Constructs the count matrix used for downstream analysis (built with deseq2 = TRUE in the workflow above).
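
To make the aggregation and matrix-building steps concrete, here is a minimal base R sketch on toy data. It does not reproduce the internals of aggregate_counts() or build_count_matrix(); the column names (sample, genus, fn, count) and the feature-by-sample layout are assumptions chosen for illustration.

```r
# Toy long-format counts. Column names are hypothetical, not the f3mr schema.
counts <- data.frame(
  sample = c("S1", "S1", "S1", "S1", "S2", "S2"),
  genus = c("Lactobacillus", "Lactobacillus", "Brochothrix",
            "Leuconostoc", "Lactobacillus", "Brochothrix"),
  fn = c("glycolysis", "glycolysis", "lipolysis",
         "glycolysis", "glycolysis", "lipolysis"),
  count = c(10, 5, 3, 4, 7, 2)
)

# Sum counts for each sample x genus x function combination.
aggregated <- aggregate(count ~ sample + genus + fn, data = counts, FUN = sum)

# Combine genus and function into one feature label.
aggregated$feature <- paste(aggregated$genus, aggregated$fn, sep = "|")

# Pivot to a feature-by-sample matrix; absent combinations become 0.
count_mat <- tapply(
  aggregated$count,
  list(aggregated$feature, aggregated$sample),
  sum
)
count_mat[is.na(count_mat)] <- 0
print(count_mat)
```

Features in rows and samples in columns is the orientation DESeq2 expects for count data, which is why the sketch pivots that way.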

2.5 Run the Pipeline

To execute the entire pipeline, run the following command in R:

tar_make()
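
tar_make() rebuilds only the targets that are out of date and skips the rest. The sketch below shows this on a hypothetical two-target pipeline (raw and total are illustrative names) created in a temporary directory: tar_outdated() lists both targets before the first run and nothing afterwards.

```r
library(targets)

# Work in a throwaway directory so the real project is untouched.
demo_dir <- tempfile("targets_run_demo_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Hypothetical two-target pipeline written to _targets.R.
tar_script({
  list(
    tar_target(raw, c(1, 2, 3)),
    tar_target(total, sum(raw))
  )
}, ask = FALSE)

# Nothing has been built yet, so every target is outdated.
outdated_before <- tar_outdated()

# Build the pipeline.
tar_make()

# After a successful run, no target should be outdated.
outdated_after <- tar_outdated()
total_value <- tar_read(total)

setwd(old_wd)
```

In this project, running tar_outdated() before tar_make() gives a preview of which of the targets from section 2.2 would be rebuilt.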

2.6 Further Exploration

Explore the output of specific targets:

# Read a stored target's value into a variable
target_output <- tar_read(count_matrix)
# Preview the top-left corner (assumes at least 5 rows and 5 columns)
target_output[1:5, 1:5]
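
Indexing a matrix with 1:5 fails with a "subscript out of bounds" error when it has fewer than five rows or columns. A minimal sketch of a safer preview follows; the small matrix m is a hypothetical stand-in for the count_matrix target.

```r
# Hypothetical small matrix standing in for the count_matrix target.
m <- matrix(
  1:12,
  nrow = 3,
  dimnames = list(paste0("feature", 1:3), paste0("S", 1:4))
)

# Clamp the preview to the rows and columns that actually exist.
rows <- seq_len(min(5, nrow(m)))
cols <- seq_len(min(5, ncol(m)))
preview <- m[rows, cols, drop = FALSE]
print(preview)
```

Using drop = FALSE keeps the result a matrix even when only one row or column is available.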